SheetReader: Efficient Specialized Spreadsheet Parsing
Abstract
Spreadsheets are widely used for data exploration. Since spreadsheet systems have limited capabilities, users often need to load spreadsheets into other data science environments to perform advanced analytics. However, current approaches to spreadsheet loading suffer from either high runtime or high memory usage, which hinders data exploration on commodity systems. To make spreadsheet loading practical on commodity systems, we introduce a novel parser that minimizes memory usage by tightly coupling decompression and parsing. Furthermore, to reduce the runtime, we introduce optimized spreadsheet-specific parsing routines and employ parallelism. To evaluate our approach, we implement prototypes for loading Excel spreadsheets into R and Python environments. Our evaluation shows that our approach is up to 3× faster while consuming up to 40× less memory than state-of-the-art approaches. The source code is available at https://github.com/fhenz/SheetReader-r.
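As an illustration of what coupling decompression and parsing can look like, the following Python sketch reads a worksheet's XML out of the compressed archive in small chunks and feeds each chunk straight into an incremental XML parser, so the fully decompressed document never has to sit in memory. It is not the authors' implementation: the function names (`iter_cells`, `make_demo_archive`), the member path `xl/worksheets/sheet1.xml`, and the chunk size are assumptions chosen for the example, and only the standard-library `zipfile` and `xml.etree.ElementTree` modules are used. The spreadsheet-specific parsing routines and the parallelism mentioned in the abstract are not modeled here.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace, as used in worksheet XML parts.
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"
ROW, CELL, VALUE = NS + "row", NS + "c", NS + "v"


def _emit(parser):
    """Yield (cell_ref, raw_text) for every cell the parser has completed."""
    for _event, elem in parser.read_events():
        if elem.tag == CELL:
            value = elem.find(VALUE)
            yield elem.get("r"), None if value is None else value.text
        elif elem.tag == ROW:
            elem.clear()  # drop the finished row's cells to keep memory flat


def iter_cells(xlsx, member="xl/worksheets/sheet1.xml", chunk_size=64 * 1024):
    """Decompress `member` chunk by chunk and parse each chunk immediately.

    Hypothetical helper: values are returned as raw text, so shared-string
    indices and inline strings are not resolved in this sketch.
    """
    parser = ET.XMLPullParser(events=("end",))
    with zipfile.ZipFile(xlsx) as archive, archive.open(member) as stream:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            parser.feed(chunk)          # parse only what was just inflated
            yield from _emit(parser)
    parser.close()
    yield from _emit(parser)


def make_demo_archive():
    """Build a tiny in-memory archive holding one worksheet part (demo only)."""
    xml = (
        '<worksheet xmlns="http://schemas.openxmlformats.org/'
        'spreadsheetml/2006/main"><sheetData>'
        '<row r="1"><c r="A1"><v>1</v></c><c r="B1"><v>2.5</v></c></row>'
        '<row r="2"><c r="A2"><v>3</v></c></row>'
        "</sheetData></worksheet>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("xl/worksheets/sheet1.xml", xml)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for ref, text in iter_cells(make_demo_archive(), chunk_size=16):
        print(ref, text)
```

The deliberately small `chunk_size=16` in the demo forces XML elements to straddle chunk boundaries, which shows that the parser state carries over between decompression steps. Peak memory then depends on the chunk size and the current row rather than on the size of the whole sheet.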
Similar resources
Shallow Parsing using Specialized HMMs
We present a unified technique to solve different shallow parsing tasks as a tagging problem using a Hidden Markov Model-based approach (HMM). This technique consists of the incorporation of the relevant information for each task into the models. To do this, the training corpus is transformed to take into account this information. In this way, no change is necessary for either the training or t...
Parsing Speech Repair without Specialized Grammar Symbols
This paper describes a parsing model for speech with repairs that makes a clear separation between linguistically meaningful symbols in the grammar and operations specific to speech repair in the operation of the parser. This system builds a model of how unfinished constituents in speech repairs are likely to finish, and finishes them probabilistically with placeholder structure. These modified...
Efficient Transformation-Based Parsing
In transformation-based parsing, a finite sequence of tree rewriting rules are checked for application to an input structure. Since in practice only a small percentage of rules are applied to any particular structure, the naive parsing algorithm is rather inefficient. We exploit this sparseness in rule applications to derive an algorithm two to three orders of magnitude faster than the standard...
Efficient Bottom-Up Parsing
This paper describes a series of experiments aimed at producing a bottom-up parser that will produce partial parses suitable for use in robust interpretation and still be reasonably efficient. In the course of these experiments, we improved parse times by a factor of 18 over our first attempt, ending with a system that was twice as fast as our previous parser, which relied on strong top-down...
Learning Efficient Parsing
A corpus-based technique is described to improve the efficiency of wide-coverage high-accuracy parsers. By keeping track of the derivation steps which lead to the best parse for a very large collection of sentences, the parser learns which parse steps can be filtered without significant loss in parsing accuracy, but with an important increase in parsing efficiency. An interesting characteristic...
Journal
Journal title: Information Systems
Year: 2023
ISSN: 0306-4379, 1873-6076
DOI: https://doi.org/10.1016/j.is.2023.102183